Abstract

We prove a general lower bound on quantum decision tree complexity in terms of an
entropy notion. We regard decision tree computation as a communication process in
which the oracle and the computer exchange several rounds of messages, each round
consisting of $O(\log n)$ bits. Let $E(f)$ be the Shannon entropy of the random variable
$f(X)$, where $X$ is uniformly random in $f$'s domain. Our main result is that it
takes $\Omega(E(f))$ queries to compute any total function $f$. It is interesting to contrast
this bound with the $\Omega(E(f)/\log n)$ bound, which is tight for partial functions. Our
approach is the polynomial method.

Keywords: Quantum computation; Decision tree; Lower bounds; Computational complexity;
Entropy

1 Introduction 



The decision tree model is probably the simplest model in the study of computational
complexity. In this model, the input $x$ is known only to an oracle, and the only way that
the computer can access the input is to ask the oracle questions of the type '$x_i = ?$'. The
computational cost is simply the number of such queries, and the complexity of a problem
is the minimal worst-case cost. For example, to find out whether or not there is a 1 in
$x_1, x_2, \ldots, x_n$, any deterministic decision tree algorithm needs to ask for all the
$x_i$'s in the worst case. Therefore, its (deterministic) decision tree complexity is $n$.
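
The claim that detecting a 1 costs $n$ deterministic queries can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original argument; the function names `dt_complexity` and `has_a_one` are illustrative assumptions. It searches over all decision trees by recursing on the set of inputs still consistent with the answers received so far.

```python
from itertools import product

def dt_complexity(f, inputs):
    """Minimal worst-case number of queries 'x_i = ?' needed to determine f
    on the given set of still-possible inputs (brute force over all trees)."""
    inputs = list(inputs)
    if len({f(x) for x in inputs}) <= 1:
        return 0  # f is already determined; no further query is needed
    n = len(inputs[0])
    best = n  # querying every bit always suffices
    for i in range(n):
        zeros = [x for x in inputs if x[i] == 0]
        ones = [x for x in inputs if x[i] == 1]
        if not zeros or not ones:
            continue  # the answer to this query is already known
        cost = 1 + max(dt_complexity(f, zeros), dt_complexity(f, ones))
        best = min(best, cost)
    return best

def has_a_one(x):
    """The example from the text: is there a 1 among x_1, ..., x_n?"""
    return int(any(x))

n = 4
cube = list(product((0, 1), repeat=n))
print(dt_complexity(has_a_one, cube))  # prints 4, i.e. n
```

The search is exponential in $n$ and is only meant for very small cubes.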

Unlike classical decision trees, a quantum decision tree algorithm can make queries
in a quantum superposition, and therefore may be intrinsically faster than any classical
algorithm. For example, Grover's quantum algorithm for finding the location of the
only 1 in an $n$-bit string makes only $O(\sqrt{n})$ queries, while any classical algorithm needs
$\Omega(n)$ queries. In recent years, the quantum decision tree model has been extensively studied
by many authors from both the upper bound and lower bound perspectives. Here we consider
the latter aspect only. A recent survey covers both classical and quantum decision tree
complexity.

Throughout this paper, $f$ denotes a function

$$f : \{0,1\}^n \supseteq A \to B = f(A) \subseteq \{0,1\}^m,$$

for some integers $n, m > 0$. Let $Q(f)$ be the quantum decision tree complexity of $f$ with
error probability bounded by 1/3. Our goal is to derive a general lower bound for $Q(f)$ in
terms of $E(f)$, defined as follows:

Definition 1.1 For any $f$, define the entropy of $f$, $E(f)$, to be the Shannon entropy of
$f(X)$, where $X$ is taken uniformly at random from $A$. More explicitly,

$$E(f) = \sum_{y \in B} p_y \log_2 \frac{1}{p_y},$$

where $p_y = \Pr_{x \in_R A}[f(x) = y]$.

We first note the following general lower bound:

Proposition 1.2 For any $f$, $Q(f) = \Omega(E(f)/\log n)$.

This fact can be proved by a standard information-theoretic argument, which we
sketch here. The computation can be viewed as a process of communication: to make a
query, the algorithm sends the oracle $\lceil \log_2 n \rceil + 1$ bits, which are then returned by the
oracle. The first $\lceil \log_2 n \rceil$ bits specify the location of the input bit being queried and the
remaining bit allows the oracle to write down the answer. Now we run the algorithm
on

$$\frac{1}{\sqrt{|A|}} \sum_{x \in A} |x\rangle_X |0\rangle_Y,$$

where $X$ and $Y$ denote the qubits that hold the input and the intermediate results of
the computer respectively. Let $S_t$ be the von Neumann entropy of the qubits in $Y$ after
the $t$-th query. If the algorithm computes $f$ in $T$ queries, at the end of the computation
we expect to have a vector close to

$$\frac{1}{\sqrt{|A|}} \sum_{x \in A} |x\rangle_X |f(x)\rangle_Y.$$

Clearly $S_0 = 0$, $S_T \approx E(f)$, and $|S_{t+1} - S_t| = O(\log n)$ for any $t$, $0 \le t \le T - 1$.
The latter two assertions can be proved by standard applications of Holevo's theorem.
Therefore $T = \Omega(E(f)/\log n)$. We will provide an example later to show that this
bound is indeed tight. This means that one quantum query can get $\log n$ bits of information,
while a classical query can get no more than 1 bit of information.
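
To make Definition 1.1 concrete, the entropy $E(f)$ can be computed directly for small functions. This is a minimal sketch; the helper name `entropy_of_function` and the two sample functions are assumptions chosen for illustration, not objects from the paper.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy_of_function(f, domain):
    """E(f): Shannon entropy of f(X) for X uniform over `domain`."""
    domain = list(domain)
    counts = Counter(f(x) for x in domain)
    total = len(domain)
    # sum over outputs y of p_y * log2(1 / p_y), with p_y = count / total
    return sum((c / total) * log2(total / c) for c in counts.values())

def has_a_one(x):
    """OR of the input bits: the output is 1 on all but one input."""
    return int(any(x))

def hamming_weight(x):
    """Number of ones: n + 1 possible outputs."""
    return sum(x)

n = 4
cube = list(product((0, 1), repeat=n))
print(entropy_of_function(has_a_one, cube))
print(entropy_of_function(hamming_weight, cube))
print(entropy_of_function(lambda x: x, cube))  # identity: entropy n
```

A highly unbalanced function such as OR has entropy well below 1, so the entropy bounds are weak for it, while functions with many roughly equally likely outputs have entropy close to $\log_2 |B|$.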

Surprisingly, this power of getting $\omega(1)$ bits of information in a query is not useful in
computing total functions, i.e., functions that are defined on every string in $\{0,1\}^n$, in the
sense that each quantum query can only get $O(1)$ bits of information on average, as stated
in our main theorem:

Theorem 1.3 (Main Theorem) For any total function $f$, $Q(f) = \Omega(E(f))$.

Now we sketch the proof idea. We take the polynomial approach. Any
correct algorithm that computes $f$ will produce a set of polynomials

$$\{\tilde f_y : \{0,1\}^n \to \mathbb{R} : y \in B\}.$$

Each $\tilde f_y$ is an approximation to the characteristic polynomial for $f^{-1}(y)$. If $f$ is a total
function, on any Boolean input, $\tilde f_y$ is forced to be close to either 0 or 1, and this
'take-it-or-leave-it' nature makes it harder to approximate; in contrast, when $f$ is not a total
function, on inputs where $f$ is not defined, $\tilde f_y$ has more freedom to take values that make
the approximation easier.

Several previous papers prove general lower bounds on quantum decision
tree complexity in terms of different complexity notions: [2] by Boolean (block) sensitivity
and by degree of approximating polynomials, and others by a combinatorial property and by
average Boolean sensitivity.

In the next two sections we shall provide a rigorous definition of the quantum decision 
tree model and then prove the main theorem. 

2 Quantum decision tree model 

In the quantum decision tree model, the computer has three sets of qubits: $P$, $Q$, and $R$.
$P$ has $n$ bits, which hold the input; $Q$ has $\lceil \log_2 n \rceil + 1$ bits, which contain a pointer to the
input bits (i.e., an integer between 1 and $n$), as well as one more bit; $R$ has an unlimited
number of bits which serve as the algorithm's working space. A quantum decision tree
computation with input $x$ is the application (from right to left) of a sequence of
unitary operators

$$A := U_T O U_{T-1} O \cdots U_1 O U_0$$

on the initial state

$$|x\rangle_P |\vec 0\rangle_{QR},$$

where $O$ is the oracle gate:

$$O |x\rangle_P |i, b\rangle_Q |c\rangle_R = |x\rangle_P |i, b \oplus x_i\rangle_Q |c\rangle_R,$$

and each $U_t = I \otimes \tilde U_t$, $0 \le t \le T$, where $I$ is the identity operator on $\ell_2(P)$ and $\tilde U_t$ is a unitary
operator on $\ell_2(Q \cup R)$. We say that the algorithm computes $f$ (with error bounded by 1/3)
if there exists a measurement $M$ on $\ell_2(Q \cup R)$ such that for any $x \in A$, with probability no
less than 2/3, $f(x)$ will be observed by applying $M$ at the final state of the computation.
The quantum decision tree complexity $Q(f)$ is defined to be the minimal $T$ such that there
is a quantum decision tree algorithm that computes $f$ in $T$ queries.

The following example demonstrates that the lower bound in Proposition 1.2 is tight.

Example 1 Assume $n$ is a power of 2. For any $z \in \{0,1\}^{\log_2 n}$, $e(z) \in \{0,1\}^n$ is defined
as follows: $e(z)_i = i \cdot z$ (parity of bitwise product). Consider $f(x) := z$ if $x = e(z)$;
otherwise $f$ is undefined. Then $E(f) = \log_2 n$, while $Q(f) = 1$. Let $H$ be the Hadamard
transformation on the $\log_2 n$ index bits in $Q$, and let $M$ act on the last bit in $Q$ such that
$M|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. It is easy to verify that for any $x = e(z)$,

$$M^{-1} H O H M |x\rangle_P |\vec 0\rangle_Q = |x\rangle_P |z, 0\rangle_Q.$$
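
The one-query computation in Example 1 can be simulated classically for small $n$. The sketch below is an illustration under stated assumptions: indices run from 0 to $n - 1$, the answer bit is kept implicitly in the state $(|0\rangle - |1\rangle)/\sqrt{2}$ so that the oracle acts as the phase $(-1)^{x_i}$ on index $i$, and the names `encode`, `hadamard`, and `one_query_decode` are hypothetical.

```python
from math import sqrt

def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") % 2

def encode(z, k):
    """e(z): the n = 2**k bit string with e(z)_i = i . z (parity of bitwise AND)."""
    return [parity(i & z) for i in range(2 ** k)]

def hadamard(amps):
    """Hadamard transform on all index bits (fast butterfly form)."""
    amps = list(amps)
    n = len(amps)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = amps[j], amps[j + h]
                amps[j], amps[j + h] = (a + b) / sqrt(2), (a - b) / sqrt(2)
        h *= 2
    return amps

def one_query_decode(x):
    """Recover z from x = e(z) using a single simulated oracle call."""
    n = len(x)
    amps = [1.0] + [0.0] * (n - 1)       # index register starts in |0...0>
    amps = hadamard(amps)                # uniform superposition over indices
    # Single oracle call: with the answer bit in (|0> - |1>)/sqrt(2),
    # the oracle multiplies the amplitude of index i by (-1)^{x_i}.
    amps = [a * (-1) ** x[i] for i, a in enumerate(amps)]
    amps = hadamard(amps)                # all amplitude now sits on |z>
    return max(range(n), key=lambda i: abs(amps[i]))

k = 3
for z in range(2 ** k):
    assert one_query_decode(encode(z, k)) == z
```

The simulation touches all $n$ amplitudes, so it says nothing about classical cost; it only confirms that the final index register equals $z$ after one oracle application.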

3 Proof of the main theorem 

For $0 \le t \le T$, let $\phi_t(x) \in \ell_2(Q \cup R)$ be the state such that

$$U_t O U_{t-1} \cdots O U_0 |x\rangle_P |\vec 0\rangle_{QR} = |x\rangle_P \otimes \phi_t(x).$$

Let $\Psi$ be any orthonormal basis for $\ell_2(Q \cup R)$. Our proof will finally make use of the
following previously observed fact:

Fact 1

$$\phi_t(x) = \sum_{\psi \in \Psi} p_\psi(x)\, \psi$$

for some set of multi-linear polynomials $p_\psi(x)$, each of which is of degree no more than $t$.

Therefore, proving lower bounds in quantum complexity can be reduced to proving
lower bounds on the degree of approximating polynomials. We shall first prove some
lemmas on the latter.

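For any $g : \{0,1\}^n \to \mathbb{R}$, define the average sensitivity of $g$,

$$s_g = E_{x,i}\big[(g(x) - g(x + e_i))^2\big],$$

and

$$p_g = E_x[g(x)].$$

Here $x + e_i$ denotes $x$ with its $i$-th bit flipped. All randomness is uniform. When $g$ is
a Boolean function, $s_g$ is just the probability that
a random edge in the Boolean cube connects two vertices of different function values, and
$p_g$ is the probability for a random input to have function value 1.

Now let $g$ be a Boolean function, and let $\tilde g : \{0,1\}^n \to [0,1]$ approximate $g$, i.e.,
$|g(x) - \tilde g(x)| \le 1/3$ for all $x \in \{0,1\}^n$. The following lemma says that a larger $s_g$ or a
smaller $p_{\tilde g}$ will force $\tilde g$ to have high degree.

Lemma 3.1 $\deg(\tilde g) \ge n s_{\tilde g}/(4 p_{\tilde g}) \ge n s_g/(36 p_{\tilde g})$.

Proof. Let $d = \deg(\tilde g)$. Then the Fourier representation of $\tilde g$ is

$$\tilde g(x) = \sum_{r \in \{0,1\}^n,\ |r| \le d} \hat g_r (-1)^{x \cdot r},$$

where $\hat g_r = E_x[\tilde g(x)(-1)^{x \cdot r}]$. By simple calculation,

$$s_{\tilde g} = \frac{4}{n} \sum_{r,\ |r| \le d} |r|\, \hat g_r^2 \le \Big( \sum_{r,\ |r| \le d} \hat g_r^2 \Big) \frac{4d}{n} = E_x\big[\tilde g(x)^2\big]\, \frac{4d}{n} \le 4 d\, p_{\tilde g}/n.$$

The last step uses $\tilde g(x)^2 \le \tilde g(x)$, which holds because $\tilde g$ takes values in $[0,1]$.
Since $\tilde g$ approximates $g$, $s_{\tilde g} \ge \frac{1}{9} s_g$. ■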
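
The Fourier calculation in the proof of Lemma 3.1 can be checked numerically on a small example. The sketch below is illustrative only: the test function `g` is an arbitrary assumption with values in $[0,1]$ and degree 2, and the helper names are hypothetical. It verifies that the average sensitivity equals $\frac{4}{n}\sum_r |r| \hat g_r^2$ and is at most $4 d\, p_g / n$.

```python
from itertools import product

def fourier_coefficients(g, n):
    """hat g_r = E_x[g(x) (-1)^{x.r}] for every r in {0,1}^n."""
    cube = list(product((0, 1), repeat=n))
    coeffs = {}
    for r in cube:
        total = 0.0
        for x in cube:
            sign = (-1) ** sum(a * b for a, b in zip(x, r))
            total += g(x) * sign
        coeffs[r] = total / len(cube)
    return coeffs

def average_sensitivity(g, n):
    """s_g = E_{x,i}[(g(x) - g(x + e_i))^2], with x and i uniform."""
    cube = list(product((0, 1), repeat=n))
    total = 0.0
    for x in cube:
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip bit i
            total += (g(x) - g(y)) ** 2
    return total / (len(cube) * n)

def g(x):
    """Arbitrary real-valued test function with values in [0, 1], degree 2."""
    return (x[0] + x[1] * x[2]) / 2

n = 3
coeffs = fourier_coefficients(g, n)
s_g = average_sensitivity(g, n)
weighted = (4 / n) * sum(sum(r) * c * c for r, c in coeffs.items())
degree = max(sum(r) for r, c in coeffs.items() if abs(c) > 1e-12)
p_g = coeffs[(0,) * n]  # the empty-set coefficient is E_x[g(x)]

assert abs(s_g - weighted) < 1e-9        # s_g = (4/n) * sum |r| * hat g_r^2
assert s_g <= 4 * degree * p_g / n + 1e-9  # hence deg >= n * s_g / (4 * p_g)
```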

The following lemma about Boolean functions will be needed immediately: 

Lemma 3.2 Let $k$ be the cardinality of $X \subseteq \{0,1\}^n$, and $t_X$ the number of edges in the Boolean
cube that connect two vertices in $X$. Then $t_X \le k \log_2 k / 2$.

Proof. By induction. It is true for $k = 1, 2$. Assume the statement is true for all
natural numbers smaller than $k$, and let us examine the case $k \ge 3$. Pick a coordinate $i$
such that both the subcubes $x_i = 1$ and $x_i = 0$ contain nonempty subsets $A$ and $B$ of
$X$. Then $t_X \le t_A + t_B + \min\{|A|, |B|\}$. We can assume without loss of generality that
$1 \le |A| = a \le k/2$. Then by simple calculation,

$$t_X \le \frac{1}{2} a \log_2 a + \frac{1}{2} (k - a) \log_2 (k - a) + a \le k \log_2 k / 2.$$

■
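
Lemma 3.2 can be sanity-checked exhaustively on a small cube. This is a brute-force sketch for illustration, not a substitute for the induction; vertices are encoded as integers and the names `induced_edges` and `check_lemma` are assumptions.

```python
from itertools import combinations
from math import log2

def induced_edges(vertices):
    """Number of Boolean-cube edges with both endpoints in `vertices`.
    Vertices are integers; an edge joins two integers differing in one bit."""
    vs = set(vertices)
    count = 0
    for u in vs:
        for v in vs:
            if u < v and bin(u ^ v).count("1") == 1:
                count += 1
    return count

def check_lemma(n):
    """Check t_X <= k * log2(k) / 2 for every nonempty subset X of {0,1}^n."""
    for k in range(1, 2 ** n + 1):
        for subset in combinations(range(2 ** n), k):
            if induced_edges(subset) > k * log2(k) / 2 + 1e-9:
                return False
    return True

print(check_lemma(3))  # prints True
```

Subcubes attain the bound with equality: a subcube of dimension $j$ has $k = 2^j$ vertices and $j 2^{j-1} = k \log_2 k / 2$ edges.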

Let $H(\cdot)$ be the entropy function, i.e., for $\eta \in [0,1]$, $H(\eta) := \eta \log_2 \frac{1}{\eta} + (1 - \eta) \log_2 \frac{1}{1 - \eta}$.
The following lemma says that if the number of true assignments is close to the number of
false assignments, then the Boolean function should have high average sensitivity:

Lemma 3.3 For any Boolean function $g$, $s_g \ge H(p_g)/n$.



Proof. Let $k = 2^n p_g$ be the number of true assignments. By Lemma 3.2, in the Boolean
cube, the number of edges that connect two true assignments is at most $k \log_2 k / 2$, and
the number of edges that connect two false assignments is at most $(2^n - k) \log_2 (2^n - k)/2$.
Since the cube has $n 2^{n-1}$ edges in total,

$$s_g = \Pr_{x,i}[g(x) \ne g(x + e_i)] \ge \frac{n 2^n - k \log_2 k - (2^n - k) \log_2 (2^n - k)}{n 2^n} = H(p_g)/n.$$

■
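
Lemma 3.3 can likewise be verified exhaustively for small $n$. The sketch below enumerates every Boolean function on 3 bits by its truth table; the helper names `binary_entropy`, `stats`, and `check` are illustrative assumptions.

```python
from itertools import product
from math import log2

def binary_entropy(eta):
    """H(eta) = eta*log2(1/eta) + (1-eta)*log2(1/(1-eta)), with H(0) = H(1) = 0."""
    if eta in (0.0, 1.0):
        return 0.0
    return eta * log2(1 / eta) + (1 - eta) * log2(1 / (1 - eta))

def stats(table, n):
    """Return (s_g, p_g) for the Boolean function whose truth table `table`
    is indexed by the integers 0 .. 2^n - 1."""
    size = 2 ** n
    p = sum(table) / size
    # Count pairs (x, i) where flipping bit i of x changes the function value.
    differing = sum(table[x] != table[x ^ (1 << i)]
                    for x in range(size) for i in range(n))
    return differing / (size * n), p

def check(n):
    """Check s_g >= H(p_g) / n for every Boolean function on n bits."""
    for table in product((0, 1), repeat=2 ** n):
        s, p = stats(table, n)
        if s < binary_entropy(p) / n - 1e-9:
            return False
    return True

print(check(3))  # prints True
```

A dictator function $g(x) = x_1$ attains equality: $s_g = 1/n$ and $H(1/2) = 1$.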



We are now ready to prove our main theorem: 

Proof. [Main Theorem] For each $y \in B$, let $f_y$ be the characteristic function of $f^{-1}(y)$,
i.e.,

$$f_y(x) = \begin{cases} 1 & \text{if } f(x) = y, \\ 0 & \text{otherwise.} \end{cases}$$

Let $\tilde f_y(x)$ be the probability that $y$ is observed as the output when the input is $x$. Then by
Fact 1, $\tilde f_y$ is a nonnegative polynomial of degree no more than $2Q(f)$, and $\tilde f_y$
approximates $f_y$. Furthermore, for any $x$, $\sum_y \tilde f_y(x) \le 1$.

For simplicity of notation, we shall use $p_y$ in place of $p_{f_y}$, $\tilde p_y$ for $p_{\tilde f_y}$, and $s_y$ for $s_{f_y}$.
Note that

$$E(f) = \sum_y p_y \log_2 \frac{1}{p_y},$$

and

$$\sum_y \tilde p_y \le 1.$$

Let $d = \max_y \deg(\tilde f_y)$. We want to get a lower bound for $d$. By Lemma 3.1,

$$\frac{n}{36} s_y \le \deg(\tilde f_y)\, \tilde p_y \le d\, \tilde p_y.$$

Summing over all $y$, and by Lemma 3.3, we get

$$d \ge \frac{n}{36} \sum_y s_y \ge \frac{1}{36} \sum_y H(p_y) \ge \frac{1}{36} E(f).$$

Since $d \le 2Q(f)$, it follows that $Q(f) = \Omega(E(f))$. ■
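
The last chain of inequalities, $n \sum_y s_y \ge \sum_y H(p_y) \ge E(f)$, can be illustrated on a small total function. This is a hedged sketch: the choice of the Hamming weight as the example function and the names `h` and `chain_terms` are assumptions made only for illustration.

```python
from itertools import product
from math import log2

def h(p):
    """Binary entropy function H(p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

def chain_terms(f, n):
    """Return (n * sum_y s_y, sum_y H(p_y), E(f)) for a total function f
    on {0,1}^n, where s_y and p_y refer to the characteristic function f_y."""
    cube = list(product((0, 1), repeat=n))
    outputs = {f(x) for x in cube}
    sum_s = sum_h = entropy = 0.0
    for y in outputs:
        p_y = sum(f(x) == y for x in cube) / len(cube)
        # Pairs (x, i) where flipping bit i changes whether f(x) equals y.
        flips = sum((f(x) == y) != (f(x[:i] + (1 - x[i],) + x[i + 1:]) == y)
                    for x in cube for i in range(n))
        sum_s += flips / (len(cube) * n)
        sum_h += h(p_y)
        entropy += p_y * log2(1 / p_y)
    return n * sum_s, sum_h, entropy

def hamming_weight(x):
    return sum(x)

a, b, c = chain_terms(hamming_weight, 4)
assert a >= b - 1e-9 and b >= c - 1e-9
print(a, b, c)
```

The second inequality holds term by term, since $H(p) \ge p \log_2 \frac{1}{p}$ for every $p \in [0,1]$.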